
    A Game-Theoretic Approach for Runtime Capacity Allocation in MapReduce

    Nowadays many companies have large amounts of raw, unstructured data available. Among Big Data enabling technologies, a central place is held by the MapReduce framework and, in particular, by its open source implementation, Apache Hadoop. For cost effectiveness, a common approach entails sharing server clusters among multiple users. The underlying infrastructure should provide every user with a fair share of computational resources, ensuring that Service Level Agreements (SLAs) are met and avoiding waste. In this paper we consider two mathematical programming problems that model the optimal allocation of computational resources in a Hadoop 2.x cluster, with the aim of developing new capacity allocation techniques that guarantee better performance in shared data centers. Our goal is to achieve a substantial reduction of power consumption while respecting the deadlines stated in the SLAs and avoiding the penalties associated with job rejections. The core of this approach is a distributed algorithm for runtime capacity allocation, based on Game Theory models and techniques, that mimics the MapReduce dynamics by means of interacting players, namely the central Resource Manager and the Class Managers.
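
    The abstract does not reproduce the underlying game-theoretic formulation, but the interaction it describes between Class Managers and the Resource Manager can be illustrated with a toy negotiation round: each Class Manager asks for enough capacity to meet its deadline, and the Resource Manager rescales requests when they exceed the cluster size. The class names, deadlines, and proportional scaling rule below are illustrative assumptions, not the paper's algorithm.

    ```python
    # Toy illustration (not the paper's algorithm): Class Managers request enough
    # capacity to meet their deadlines; the Resource Manager rescales requests
    # proportionally when the total exceeds the cluster capacity.

    CLUSTER_CONTAINERS = 120  # hypothetical total YARN containers

    # Hypothetical job classes: remaining work (container-seconds) and deadline (s)
    classes = {
        "interactive": {"work": 3600.0, "deadline": 30.0},
        "batch":       {"work": 7200.0, "deadline": 300.0},
        "reporting":   {"work": 1800.0, "deadline": 60.0},
    }

    def class_manager_demand(job):
        """Each Class Manager requests the minimum capacity meeting its deadline."""
        return job["work"] / job["deadline"]

    def resource_manager_allocate(demands, total):
        """The Resource Manager grants demands if feasible, otherwise scales them
        proportionally so the shares fit the cluster capacity."""
        requested = sum(demands.values())
        factor = min(1.0, total / requested)
        return {name: d * factor for name, d in demands.items()}

    demands = {name: class_manager_demand(job) for name, job in classes.items()}
    allocation = resource_manager_allocate(demands, CLUSTER_CONTAINERS)
    for name, share in allocation.items():
        print(f"{name}: {share:.1f} containers")
    ```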

    A Combined Analytical Modeling Machine Learning Approach for Performance Prediction of MapReduce Jobs in Hadoop Clusters

    Nowadays MapReduce and its open source implementation, Apache Hadoop, are the most widespread solutions for handling massive datasets on clusters of commodity hardware. At the expense of somewhat reduced performance in comparison to HPC technologies, the MapReduce framework provides fault tolerance and automatic parallelization without any effort by developers. Since in many cases Hadoop is adopted to support business critical activities, it is often important to predict with fair confidence the execution time of submitted jobs, for instance when SLAs are established with end users. In this work, we propose and validate a hybrid approach exploiting both queuing networks and support vector regression, in order to achieve good accuracy without too many costly experiments on a real setup. The experimental results show how the proposed approach attains a 21% improvement in accuracy over applying machine learning techniques without any support from analytical models.
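
    A minimal sketch of the hybrid idea: feed a cheap analytical estimate (here a simple task-waves approximation, standing in for the queuing-network prediction) to a support vector regressor alongside raw job parameters, so the learner mostly corrects what the analytical model misses. The synthetic data, feature choice, and hyper-parameters are assumptions for illustration only.

    ```python
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Synthetic training jobs: number of map tasks and number of containers.
    n_maps = rng.integers(50, 500, size=80)
    containers = rng.integers(4, 40, size=80)

    # Cheap analytical estimate standing in for the queuing-network model:
    # average task time multiplied by the number of execution waves.
    AVG_TASK_TIME = 12.0  # seconds, assumed
    analytical = AVG_TASK_TIME * np.ceil(n_maps / containers)

    # "Measured" times: the analytical trend plus overheads the model misses.
    measured = analytical * 1.15 + 30.0 + rng.normal(0.0, 10.0, size=80)

    # The SVR sees both the raw job features and the analytical prediction.
    X = np.column_stack([n_maps, containers, analytical])
    model = make_pipeline(StandardScaler(), SVR(C=100.0, epsilon=5.0))
    model.fit(X, measured)

    test = np.array([[300, 16, AVG_TASK_TIME * np.ceil(300 / 16)]])
    print("predicted execution time:", model.predict(test)[0])
    ```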

    Fluid Petri Nets for the Performance Evaluation of MapReduce Applications

    Big Data applications make it possible to successfully analyze large amounts of data, not necessarily structured, though at the same time they present new challenges. For example, predicting the performance of frameworks such as Hadoop can be a costly task, hence the need for models that can be a valuable support for designers and developers. This paper contributes a novel modeling approach based on fluid Petri nets to predict MapReduce job execution times. The experiments we performed at CINECA, the Italian supercomputing center, have shown that the achieved accuracy is within 16% of the actual measurements on average.
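
    The fluid Petri net model itself is not reproduced in the abstract; as a rough intuition for why a fluid view is convenient, one can approximate a MapReduce stage as a continuous level of remaining work drained at a rate set by the busy slots. The task counts, slot count, and rate below are hypothetical and this is not the paper's model.

    ```python
    # Rough fluid-style intuition (not the paper's Petri net model): treat the
    # remaining map work as a fluid level drained by the available slots.

    N_TASKS = 400          # map tasks, hypothetical
    SLOTS = 32             # concurrent containers, hypothetical
    TASK_RATE = 1 / 15.0   # tasks completed per second per slot (assumed)

    remaining = float(N_TASKS)
    t, dt = 0.0, 0.1
    while remaining > 1.0:                  # stop when less than one task's worth of fluid is left
        busy = min(SLOTS, remaining)        # slots cannot outnumber remaining tasks
        remaining -= busy * TASK_RATE * dt  # fluid drain of remaining work
        t += dt

    print(f"fluid estimate of map stage completion: {t:.1f} s")
    # Closed-form approximation when N_TASKS >> SLOTS; the fluid tail above also
    # captures the slowdown once fewer tasks than slots remain.
    print(f"closed-form approximation: {N_TASKS / (SLOTS * TASK_RATE):.1f} s")
    ```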

    D-SPACE4Cloud: Towards Quality-Aware Data Intensive Applications in the Cloud

    The last years witnessed a steep rise in data generation worldwide and, consequently, the widespread adoption of software solutions claiming to support data intensive applications. Competitiveness and innovation have strongly benefited from these new platforms and methodologies, and there is a great deal of interest around the new possibilities that Big Data analytics promise to make reality. Many companies currently engage in data intensive processes as part of their core businesses; however, fully embracing the data-driven paradigm is still cumbersome, and establishing a production-ready, fine-tuned deployment is time-consuming, expensive, and resource-intensive. This situation calls for novel models and techniques to streamline the process of deployment configuration for Big Data applications. In particular, the focus of this paper is on the rightsizing of Cloud deployed clusters, which represent a cost-effective alternative to installation on premises. We propose a novel tool, integrated in a wider DevOps-inspired approach, implementing a parallel and distributed simulation-optimization technique that efficiently and effectively explores the space of alternative resource configurations, seeking the minimum cost deployment that satisfies predefined quality of service constraints. The validity and relevance of the proposed solution have been thoroughly assessed in a vast experimental campaign including different applications and Big Data platforms.
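
    The abstract describes the space exploration only at a high level; the sketch below conveys its flavor with a made-up price catalog and a stand-in performance predictor, keeping the cheapest configuration whose predicted execution time meets the deadline. All VM names, prices, and model parameters are assumptions, not D-SPACE4Cloud's actual implementation.

    ```python
    # Illustrative cluster right-sizing search: enumerate candidate configurations,
    # predict execution time for each, and keep the cheapest feasible one.

    DEADLINE_S = 600.0   # hypothetical QoS constraint
    DATASET_GB = 500.0   # hypothetical input size

    # Hypothetical catalog: (VM type, cores per VM, hourly price in $)
    catalog = [("small", 4, 0.20), ("medium", 8, 0.40), ("large", 16, 0.80)]

    def predict_exec_time(total_cores):
        """Stand-in for the simulator/analytical predictor: time shrinks with cores
        but keeps a fixed serial overhead (Amdahl-style assumption)."""
        return 120.0 + 9000.0 * DATASET_GB / 500.0 / total_cores

    best = None
    for vm_type, cores, price in catalog:
        for n_vms in range(1, 65):
            t = predict_exec_time(cores * n_vms)
            if t > DEADLINE_S:
                continue  # deadline violated: configuration infeasible
            cost = n_vms * price * (t / 3600.0)
            if best is None or cost < best[0]:
                best = (cost, vm_type, n_vms, t)

    cost, vm_type, n_vms, t = best
    print(f"cheapest feasible deployment: {n_vms} x {vm_type}, "
          f"predicted time {t:.0f} s, cost ${cost:.2f}")
    ```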

    Performance Prediction of Cloud-Based Big Data Applications

    Big data analytics have become widespread as a means to extract knowledge from large datasets. Yet, the heterogeneity and irregularity usually associated with big data applications often overwhelm the existing software and hardware infrastructures. In such context, the flexibility and elasticity provided by the cloud computing paradigm offer a natural approach to cost-effectively adapting the allocated resources to the application’s current needs. However, these same characteristics impose extra challenges to predicting the performance of cloud-based big data applications, a key step to proper management and planning. This paper explores three modeling approaches for performance prediction of cloud-based big data applications. We evaluate two queuing-based analytical models and a novel fast ad hoc simulator in various scenarios based on different applications and infrastructure setups. The three approaches are compared in terms of prediction accuracy, finding that our best approaches can predict average application execution times with 26% relative error in the very worst case and about 7% on average.
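
    The abstract does not detail the two queuing-based models, but exact Mean Value Analysis for a closed product-form queueing network is a standard building block for this kind of predictor and gives a sense of how such an analytical estimate is computed. The station demands and population below are hypothetical, and MVA here stands in for whatever specific models the paper evaluates.

    ```python
    def mva(demands, n_jobs, think_time=0.0):
        """Exact Mean Value Analysis for a single-class closed queueing network.
        demands[k] is the total service demand at station k (seconds)."""
        q = [0.0] * len(demands)  # mean queue lengths, starting from an empty network
        for n in range(1, n_jobs + 1):
            r = [d * (1.0 + qk) for d, qk in zip(demands, q)]  # residence times
            x = n / (think_time + sum(r))                      # system throughput
            q = [x * rk for rk in r]                           # updated queue lengths
        return x, sum(r)  # throughput and total residence time per job

    # Hypothetical abstraction of a big data node: CPU, disk, and shuffle/network demands.
    throughput, resp_time = mva(demands=[40.0, 15.0, 10.0], n_jobs=8)
    print(f"predicted response time per job: {resp_time:.1f} s, "
          f"throughput {throughput:.3f} jobs/s")
    ```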

    Performance Prediction of Deep Learning Applications Training in GPU as a Service Systems

    Data analysts predict that the GPU as a Service (GPUaaS) market will grow from US$700 million in 2019 to US$7 billion in 2025, with a compound annual growth rate of over 38%, to support 3D models, animated video processing, and gaming. GPUaaS adoption will also be boosted by the use of graphics processing units (GPUs) to support Deep learning (DL) model training. Indeed, nowadays, the main cloud providers already offer in their catalogs GPU-based virtual machines pre-installed with popular DL frameworks (like Torch, PyTorch, TensorFlow, and Caffe), simplifying DL model programming operations. Motivated by these considerations, this paper studies GPU-deployed neural networks (NNs) and tackles the issue of performance prediction, particularly with respect to NN training times. The proposed approach is based on machine learning and exploits two main sets of features describing, on the one hand, the network architecture and the hyper-parameters and, on the other, the hardware characteristics of the target deployment. Such data enable the learning of multiple linear regression models which, coupled with an established feature selection technique, become accurate prediction tools, with errors below 11% on average. An extensive experimental campaign, performed both on public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription and shows that prediction errors remain small even when extrapolating outside the range spanned by the input data. This has important implications for the models’ applicability: in this way, it is possible to investigate the impact on performance of different GPUaaS deployments or hardware upgrades even without conducting an empirical investigation on the specific target device, or to evaluate the changes in training time when the number of inner modules in the deep neural network varies.
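
    The overall recipe, linear regression over architecture and hardware features plus a feature selection step, can be sketched with off-the-shelf scikit-learn components. The synthetic features and targets are invented for illustration, and recursive feature elimination stands in for whatever selection technique the authors actually adopted.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.feature_selection import RFE

    rng = np.random.default_rng(1)
    n = 200

    # Synthetic descriptors: network/hyper-parameter features and hardware features.
    X = np.column_stack([
        rng.uniform(1e6, 5e7, n),    # number of NN parameters
        rng.integers(16, 256, n),    # batch size
        rng.integers(10, 200, n),    # number of layers
        rng.uniform(5, 16, n),       # GPU peak TFLOPS
        rng.uniform(300, 900, n),    # GPU memory bandwidth (GB/s)
    ])

    # Synthetic per-epoch training time: grows with model size, shrinks with TFLOPS.
    y = 5.0 + 2.0 * X[:, 0] / 1e6 / X[:, 3] + 0.05 * X[:, 2] + rng.normal(0, 1.0, n)

    # Linear regression wrapped in recursive feature elimination keeps the
    # three most informative features.
    selector = RFE(LinearRegression(), n_features_to_select=3)
    selector.fit(X, y)
    print("selected features:", selector.support_)

    pred = selector.predict(X)
    rel_err = np.abs(pred - y) / y
    print(f"mean relative error on training data: {rel_err.mean():.2%}")
    ```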

    Fluid petri nets for the performance evaluation of mapreduce and spark applications

    Big Data applications make it possible to successfully analyze large amounts of data, not necessarily structured, though at the same time they present new challenges. For example, predicting the performance of frameworks such as Hadoop and Spark can be a costly task, hence the need for models that can be a valuable support for designers and developers. Big Data systems are becoming a central force in society, and the use of models can also enable the development of intelligent systems providing Quality of Service (QoS) guarantees to their users through runtime system reconfiguration. This paper contributes a novel modeling approach based on fluid Petri nets to predict MapReduce and Spark application execution times, which is suitable for runtime performance prediction. The models have been validated by an extensive experimental campaign performed at CINECA, the Italian supercomputing center, and on the Microsoft Azure HDInsight data platform. Results have shown that, on average, the achieved accuracy is around 9.5% of the actual measurements for MapReduce and about 10% for Spark.

    Optimal Resource Allocation of Cloud-Based Spark Applications

    Nowadays, the big data paradigm is consolidating its central position in the industry, as well as in society at large. Lots of applications, across disparate domains, operate on huge amounts of data and offer great advantages both for business and research. According to analysts, cloud computing adoption is steadily increasing to support big data analyses, and Spark is expected to take a prominent market position for the next decade. As big data applications gain more and more importance over time, and given the dynamic nature of cloud resources, it is fundamental to develop an intelligent resource management system to provide Quality of Service guarantees to end users. This paper presents a set of run-time optimization-based resource management policies for advanced big data analytics. Users submit Spark applications characterized by a priority and by a hard or soft deadline. The optimization policies address two scenarios: i) identification of the minimum capacity to run a Spark application within its deadline; ii) re-balancing of the cloud resources in case of heavy load, minimising the weighted soft deadline tardiness across applications. The solution relies on an initial non-linear programming model formulation and a search space exploration based on simulation-optimization procedures. Spark application execution times are estimated by relying on a gamut of techniques, including machine learning, approximate analyses, and simulation. The benefits of the approach are evaluated on Microsoft Azure HDInsight and on a private cloud cluster based on POWER8, considering the TPC-DS industry benchmark and SparkBench. The results obtained in the first scenario demonstrate that the percentage error of the predicted optimal resource usage, with respect to system measurements and exhaustive search, is in the range 4%-29%, while literature-based techniques present an average error in the range 6%-63%. Moreover, in the second scenario, the proposed algorithms can address complex problems, like computing the optimal redistribution of resources among tens of applications, in less than a minute with an error of 8% on average. On the same tests, literature-based approaches obtain an average error of about 57%.
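
    The first scenario, finding the minimum capacity that lets a Spark application finish within its deadline, can be illustrated with a toy search under an assumed Amdahl-style execution-time model. The model, its parameters, and the core budget below are assumptions for illustration, not the paper's non-linear programming formulation.

    ```python
    # Toy version of scenario (i): smallest number of cores meeting the deadline,
    # under an assumed execution-time model t(c) = t_serial + t_parallel / c.

    T_SERIAL = 90.0        # seconds of non-parallelizable work (assumed)
    T_PARALLEL = 14000.0   # core-seconds of parallelizable work (assumed)
    DEADLINE = 400.0       # hard deadline from the SLA (seconds)
    MAX_CORES = 512        # available core budget (assumed)

    def exec_time(cores):
        """Predicted execution time for a given number of cores."""
        return T_SERIAL + T_PARALLEL / cores

    # t(c) is decreasing in c, so a simple scan (or bisection) finds the minimum.
    min_cores = next(
        (c for c in range(1, MAX_CORES + 1) if exec_time(c) <= DEADLINE), None
    )

    if min_cores is None:
        print("deadline cannot be met within the core budget")
    else:
        print(f"minimum cores: {min_cores}, predicted time {exec_time(min_cores):.0f} s")
    ```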

    DICE H2020 Deliverable D3.9 — Final version

    The data hereby published supports the results of the European research project DICE H2020 deliverable D3.9, which can be downloaded at http://www.dice-h2020.eu/deliverables/. When referring to the data, please cite the above mentioned deliverable.